Automatic speech recognition system development in the “wild”
The standard framework for developing an automatic speech recognition (ASR) system is to generate training and development data for building the system, and evaluation data for the final performance analysis. All the data is assumed to come from the domain of interest. Though this framework is well matched to some tasks, it is more challenging for systems that are required to operate over broad domains, or where the ability to collect the required data is limited. This paper discusses ASR work performed under the IARPA MATERIAL program, which is aimed at cross-language information retrieval, and examines this challenging scenario. In terms of available data, only limited narrow-band conversational telephone speech data was provided. However, the system is required to operate over a range of domains, including broadcast data. As no data is available for the broadcast domain, this paper proposes an approach for system development based on scraping "related" data from the web, and using ASR system confidence scores as the primary metric for developing the acoustic and language model components. As an initial evaluation of the approach, the Swahili development language is used, with the final system performance assessed on the IARPA MATERIAL Analysis Pack 1 data. This work was supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Air Force Research Laboratory (AFRL).
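The confidence-driven development loop described above can be caricatured in a few lines. This is a hypothetical sketch: the build names and scores are invented, and a real system would average lattice-based word confidences over held-out untranscribed target-domain audio rather than use hand-set numbers.

```python
# Hypothetical sketch: ranking candidate acoustic/language model builds by the
# mean utterance-level ASR confidence they assign to untranscribed
# target-domain audio, used as a stand-in for WER when no reference
# transcripts exist.

def mean_confidence(utterance_confidences):
    """Average per-utterance confidence scores for one system build."""
    return sum(utterance_confidences) / len(utterance_confidences)

def rank_builds(builds):
    """builds: dict mapping build name -> list of per-utterance confidences.
    Returns build names sorted from most to least confident."""
    return sorted(builds, key=lambda name: mean_confidence(builds[name]),
                  reverse=True)

# Invented example: a baseline versus a build whose LM was augmented with
# scraped web text.
builds = {
    "baseline": [0.61, 0.58, 0.64],
    "plus_web_lm": [0.72, 0.69, 0.75],
}
print(rank_builds(builds))  # the web-augmented build ranks first here
```

The design choice being illustrated is simply that, absent references, system confidence on in-domain audio is treated as the selection signal for both acoustic and language model development.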
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Recently, there has been growth in providers of speech transcription services, enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black-box systems, however, offer limited means for quality control, as only word sequences are typically available. This paper examines this limited-resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. In particular, it explores what other sources of word- and sub-word-level information available in the transcription process could be used to improve confidence scores. To encode all such information, this paper extends lattice recurrent neural networks to handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation system show that the use of additional information yields significant gains in confidence estimation accuracy. The implementation for this model can be found online. Comment: 5 pages, 8 figures, ICASSP submission.
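A minimal sketch may help fix intuitions about the lattice inputs involved. The toy lattice and the posterior computation below are hypothetical and heavily simplified (parallel arcs between the same node pair only, not a general forward-backward pass); the paper's lattice recurrent neural networks consume such lattices, enriched with sub-word features, to produce improved confidence estimates.

```python
import math

# Hypothetical toy lattice as arcs: (from_node, to_node, word, log_likelihood).
arcs = [
    (0, 1, "the", math.log(0.6)),
    (0, 1, "a",   math.log(0.4)),
    (1, 2, "cat", math.log(1.0)),
]

def arc_posteriors(arcs):
    """Posterior of each arc as its likelihood share among competing arcs
    between the same node pair (a simplification valid for this topology)."""
    totals = {}
    for f, t, _, ll in arcs:
        totals[(f, t)] = totals.get((f, t), 0.0) + math.exp(ll)
    return {(f, t, w): math.exp(ll) / totals[(f, t)]
            for f, t, w, ll in arcs}

post = arc_posteriors(arcs)
print(round(post[(0, 1, "the")], 2))  # 0.6
```

Such arc posteriors are the classic lattice-based confidence baseline that richer word and sub-word features aim to improve on.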
Unicode-based graphemic systems for limited resource languages
© 2015 IEEE. Large vocabulary continuous speech recognition systems require a mapping from words, or tokens, into sub-word units to enable robust estimation of acoustic model parameters, and to model words not seen in the training data. The standard approach to achieve this is to manually generate a lexicon where words are mapped into phones, often with attributes associated with each of these phones. Context-dependent acoustic models are then constructed using decision trees where questions are asked based on the phones and phone attributes. For low-resource languages, it may not be practical to manually generate a lexicon. An alternative approach is to use a graphemic lexicon, where the 'pronunciation' for a word is defined by the letters forming that word. This paper proposes a simple approach for building graphemic systems for any language written in Unicode. The attributes for graphemes are automatically derived using features from the Unicode character descriptions. These attributes are then used in decision tree construction. This approach is examined on the IARPA Babel Option Period 2 languages, and a Levantine Arabic CTS task. The described approach achieves comparable, and complementary, performance to phonetic lexicon-based approaches.
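The idea of deriving grapheme attributes from Unicode character descriptions can be sketched directly with Python's standard `unicodedata` module. The particular attributes chosen here (general category, a vowel flag from the character name, combining status) are illustrative assumptions, not the paper's exact feature set:

```python
import unicodedata

def grapheme_attributes(ch):
    """Derive illustrative attributes for a grapheme from its Unicode
    character description, for use as decision-tree questions."""
    name = unicodedata.name(ch, "")
    return {
        "category": unicodedata.category(ch),  # e.g. 'Lo' = letter, other
        "is_vowel_name": "VOWEL" in name,      # many vowel signs are named so
        "is_combining": unicodedata.combining(ch) != 0,
    }

# Devanagari letter KA versus the dependent vowel sign AA
print(grapheme_attributes("\u0915"))
print(grapheme_attributes("\u093E"))
```

Because these attributes come from the Unicode database rather than a hand-built lexicon, the same derivation applies unchanged to any language written in Unicode.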
A language space representation for speech recognition
© 2015 IEEE. The number of languages for which speech recognition systems have become available is growing each year. This paper proposes to view languages as points in some rich space, termed language space, where the bases are eigen-languages and a particular choice of projection determines each language's point. Such an approach could not only reduce development costs for each new language but also provide automatic means for language analysis. For the initial proof of the concept, this paper adopts cluster adaptive training (CAT), known for inducing similar spaces for speaker adaptation needs. The CAT approach used in this paper builds on the previous work for language adaptation in speech synthesis and extends it to Gaussian mixture modelling more appropriate for speech recognition. Experiments conducted on IARPA Babel program languages show that such language space representations can outperform language-independent models and discover closely related languages in an automatic way.
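The core CAT interpolation can be sketched in a few lines of NumPy. This is a hypothetical, stripped-down illustration: the basis matrix and weights are random placeholders, and a real CAT system interpolates per Gaussian component with estimated cluster means rather than a single matrix.

```python
import numpy as np

# Hypothetical sketch of CAT-style interpolation: each eigen-language
# contributes a basis mean vector (a column of M), and a language is a point
# lam in that space; its adapted Gaussian mean is the combination M @ lam.
rng = np.random.default_rng(0)
n_dim, n_bases = 4, 3
M = rng.standard_normal((n_dim, n_bases))  # columns: eigen-language means

lam = np.array([0.7, 0.2, 0.1])  # language-specific interpolation weights
mu = M @ lam                     # adapted mean for this language

print(mu.shape)  # (4,)
```

In this view, estimating a new language reduces to estimating the low-dimensional weight vector `lam`, which is why such spaces can cut per-language development cost and why nearby points suggest related languages.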
Low-resource speech recognition and keyword-spotting
© Springer International Publishing AG 2017. The IARPA Babel program ran from March 2012 to November 2016. The aim of the program was to develop agile and robust speech technology that can be rapidly applied to any human language in order to provide effective search capability on large quantities of real-world data. This paper will describe some of the developments in speech recognition and keyword-spotting during the lifetime of the project. Two technical areas will be briefly discussed, with a focus on techniques developed at Cambridge University: the application of deep learning for low-resource speech recognition; and efficient approaches for keyword spotting. Finally, a brief analysis of the Babel speech language characteristics and language performance will be presented.
Hyperspectral imaging to measure apricot attributes during storage
The fruit industry needs rapid and non-destructive techniques to evaluate the quality of the products in the field and during the post-harvest phase. The soluble solids content (SSC), in terms of °Brix, and the flesh firmness (FF) are typical parameters used to measure fruit quality and maturity state. Hyperspectral imaging (HSI) is a powerful technique that combines image analysis and infrared spectroscopy. This study aimed to evaluate the potential of applying Vis/NIR push-broom hyperspectral imaging (400 to 1000 nm) to predict the firmness and the °Brix in apricots (180 samples) during storage (11 days). Partial least squares (PLS) and artificial neural networks (ANN) were used to develop predictive models. For the PLS, R2 values (test set) up to 0.85 (RMSEP=1.64 N) and 0.72 (RMSEP=0.51 °Brix) were obtained for the FF and SSC, respectively. Concerning the ANN, the best results in the test set were achieved for the FF (R2=0.85, RMSEP=1.50 N). The study showed the potential of the HSI technique as a non-destructive tool for measuring apricot quality along the whole supply chain.
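The test-set figures of merit quoted above, R2 and RMSEP, are easy to compute once a calibration model has produced predictions. The toy firmness values below are invented for illustration; only the metric definitions reflect what is reported in such studies.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction on the test set."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination on the test set."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Invented example: reference firmness (N) versus model predictions.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [10.5, 11.5, 14.5, 15.5]
print(round(rmsep(y_true, y_pred), 3), round(r2(y_true, y_pred), 3))  # 0.5 0.95
```

RMSEP carries the units of the measured attribute (N for firmness, °Brix for SSC), which is why the study reports both metrics together.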
Incorporating uncertainty into deep learning for spoken language assessment
There is a growing demand for automatic assessment of spoken English proficiency. These systems need to handle large variations in input data owing to the wide range of candidate skill levels and L1s, and errors from ASR. Some candidates will be a poor match to the training data set, undermining the validity of the predicted grade. For high-stakes tests it is essential for such systems not only to grade well, but also to provide a measure of their uncertainty in their predictions, enabling rejection to human graders. Previous work examined Gaussian Process (GP) graders which, though successful, do not scale well with large data sets. Deep Neural Networks (DNN) may also be used to provide uncertainty using Monte-Carlo Dropout (MCD). This paper proposes a novel method to yield uncertainty and compares it to GPs and DNNs with MCD. The proposed approach explicitly teaches a DNN to have low uncertainty on training data and high uncertainty on generated artificial data. In experiments conducted on data from the Business Language Testing Service (BULATS), the proposed approach is found to outperform GPs and DNNs with MCD in uncertainty-based rejection whilst achieving comparable grading performance.
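The MCD baseline mentioned above can be sketched compactly. This is a hypothetical, minimal stand-in: a single linear layer with input dropout replaces a real grading DNN, and the spread of repeated stochastic passes serves as the uncertainty used for rejection.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_with_dropout(x, W, p=0.5):
    """One stochastic pass of a single linear layer with dropout kept active
    at test time (inverted dropout scaling)."""
    mask = rng.random(x.shape) >= p
    return float((x * mask / (1.0 - p)) @ W)

def mcd_predict(x, W, T=100):
    """Monte-Carlo Dropout: predictive mean and std over T dropout samples."""
    samples = np.array([forward_with_dropout(x, W) for _ in range(T)])
    return samples.mean(), samples.std()

# Invented toy features and weights standing in for a trained grader.
x = np.array([1.0, 2.0, -1.0, 0.5])
W = np.array([0.3, -0.2, 0.1, 0.4])
mean, std = mcd_predict(x, W)
print(std > 0)  # nonzero spread is the uncertainty signal for rejection
```

A rejection policy then routes candidates whose `std` exceeds a threshold to human graders; the paper's proposed method instead trains the network directly to separate in-domain from artificial inputs by their uncertainty.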
Inflectional loci of scrolls
Let $X \subset \mathbb{P}^N$ be a scroll over a smooth curve and let $L = \mathcal{O}_{\mathbb{P}^N}(1)|_X$ denote the hyperplane bundle. The special geometry of $X$ implies that some sheaves related to the principal part bundles of $L$ are locally free. The inflectional loci of $X$ can be expressed in terms of these sheaves, leading to explicit formulas for the cohomology classes of the loci. The formulas imply that the only uninflected scrolls are the balanced rational normal scrolls. Comment: 9 pages, improved version. Accepted in Mathematische Zeitschrift.